Abstract

We introduce BruNet, a general P2P software framework
which we use to produce the first implementation of
Symphony, a 1-D Kleinberg small-world architecture.
Our framework is designed to make it easy to implement and
measure different P2P protocols over different transport
layers such as TCP or UDP. This paper discusses our
implementation of the Symphony network, which allows
each node to keep $k < \log N$ shortcut connections and
to route to any other node with a short average delay
of $O(\frac{1}{k} \log^2 N)$ hops. We present experimental results
taken from several PlanetLab deployments of up to 1060
nodes. These successful deployments represent some of
the largest PlanetLab deployments of P2P overlays found
in the literature, and show our implementation's robustness
to massive node dynamics in a WAN environment.

1 Introduction: Motivation and Summary 
of Results 

Peer-To-Peer (P2P) networking is an increasingly pop- 
ular network model where nodes communicate directly 
without utilizing a centralized server. In recent years, 
P2P file-sharing applications have flourished. A recent 
study shows that P2P systems are responsible for ap- 
proximately one half of the network traffic at a major 
university [1] and comprise a significant fraction of total
Internet traffic. For a review of P2P search systems, see 
[2]. 

There are three novel contributions reported in this 
work. First, we describe a new P2P software framework 
called BruNet. The BruNet framework handles most of 
the issues common to all P2P protocols such as dealing 
with firewalls and NATs, connecting nodes, and routing 
packets. Second, we use the BruNet P2P framework
to implement Symphony [3], a 1-D Kleinberg routable
small-world network [4, 5]. This is the first implementation
of a 1-D routable small-world network. Third, we
report on large-scale PlanetLab tests involving more than
1000 nodes, which puts the P2P networks described here 
amongst the largest P2P networks to be tested on Plan- 
etLab. These deployments demonstrate our implemen- 
tation's robustness to massive node dynamics in a WAN 
environment. 

Our BruNet software architecture manages P2P packet 
routing and connection maintenance. Given a packet 
with a particular destination address A, the system will 
deliver the packet to the node closest to that address. 
This sort of routing primitive may be used to build a dis- 
tributed hash table (DHT), which is common in the P2P 
literature. Clearly, the success and efficacy of such an
ad hoc addressing and routing scheme depend on the
robustness of the structured overlay network.

The deployment of DHT P2P systems such as the 
Kademlia-based[6] eDonkey, which already supports 
about a million simultaneous users, indicates that large- 
scale overlay networks are feasible. The existence of 
such large-scale DHT systems is impressive; however,
the performance of P2P networks at that scale has not
yet been systematically studied. While we have not yet
scaled to one million nodes, our experiments with more
than 1,000 nodes are amongst the largest P2P networks to be
tested on PlanetLab. The data we obtained from deploy-
ments of our system on PlanetLab show that the struc- 
tured routing network can indeed be bootstrapped from 
a random initial network, and can be robust to high rates 
of joins and departures of participating nodes. 

We chose Symphony, the 1-D Kleinberg routable 
small-world network[3, 4, 5] as the topology for the 
structured overlay network. This ringlike address space 
entails simple routing calculations and requires very low 
node state. Our structured overlay is currently the only 
implementation of a 1-D Kleinberg routable small-world
network; as reviewed in the next section, a number of
schemes that utilize the 1-D small-world model have
been proposed, but to the best of the authors' knowledge
none have been deployed and tested in a WAN
environment. Kleinberg proved that properly designed
small-world networks could support efficient decentralized
routing with $O(\log^2 N)$ latency. The proposed
system uses a 160-bit address space to construct a ringlike
structure. Shortcuts are made in this ringlike address
space according to a specific probability distribution.
The analysis and simulation results in [3] show that
maintaining $k < \log N$ long-range neighbors improves
routing latency to $O(\frac{1}{k} \log^2 N)$.

Our functioning implementation adds several new features
to the routable small-world model, including expanded
routing rules to permit firewall traversal and
bootstrapping and also to obtain a structured 1-D ring
starting from any initially connected network. Networks
up to 1060 nodes have been deployed on PlanetLab, as 
we discuss later in Section 5. A key goal of this effort 
is that the network remains routable in the presence of 
massive node dynamics including massive joins, massive 
failures, ring merging and churn. The system's robust- 
ness under heavy node dynamics compares very favor- 
ably to the results published for Tapestry [7]; moreover, 
our deployment has more than twice the number of nodes 
dealt with in [7]. 

The paper is laid out as follows: we first discuss re- 
lated work in the following section. Section 3 describes 
the BruNet software architecture and system compo- 
nents. Section 3 also includes our approach to travers- 
ing firewalls and NAT devices. Section 4 provides de- 
tails on our Symphony implementation. Finally, Section 
5 presents PlanetLab experiments that demonstrate the 
correctness and robustness of the network. 

2 Related Work 

There has been much recent work on producing struc- 
tured P2P overlays with distributed hash table (DHT) 
interfaces. Some examples of these structured systems 
include [8, 9, 10, 11, 12, 13, 7, 14, 6, 3]. The main ad- 
vantages of these structured DHT systems are scalable 
object location in $O(\log N)$ or $O(\log^2 N)$ steps and the
guaranteed retrieval of any existing object. 

This paper reports on an implementation and measure- 
ments rather than simulation of a P2P network. While 
there are many reports of simulations of structured P2P 
protocols, the measurement of such protocols in real 
world WAN environments has rarely been addressed 
(e.g. [7]). 

Among the existing structured systems, there are sev- 
eral Kleinberg-inspired small-world P2P overlays: Sym- 
phony [3] provides a detailed software design for a DHT 
system based on a unit-circumference ring; Accordion 
[15] is a proposed small-world-based structured system 
designed to provide efficient bandwidth management of 
the distributed routing tables; Mercury [16] presents a
protocol for supporting multi-attribute range queries that
layers on top of a small-world-based ring; SWAN [17] is
an implemented multi-agent system based on the original
2-D Kleinberg model [5]. Of the aforementioned small-world
P2P systems, only SWAN has been implemented,
while performance estimates for Symphony, Accordion,
and Mercury are based solely on simulations. Therefore
the presented system appears to be the first implementation
of the 1-D ring-based Kleinberg routable small-world
network.

Figure 1: The structured ring permits efficient routing
between nodes. This 200-node network was run on PlanetLab.

3 BruNet System Architecture 

The BruNet P2P software framework is designed to al- 
low easy implementations of many different protocols. 
The software is implemented in the C# programming 
language using the Mono compiler and virtual machine 
on GNU/Linux based systems. This section provides a 
general overview of the basic primitives of the system, 
namely nodes, addresses, edges, routers and connection 
overlords. 

3.1 Nodes and Addressing 

The active elements in the system are called nodes. Each 
node can send packets, receive packets, and route pack- 
ets. A particular computer system, such as a desktop PC 
or a server system may host one or more nodes. The 
node is envisioned as an agent for a user or software 
application. Each node has exactly one address, which 
uniquely identifies that node on the network. Addition- 
ally, each node maintains several edges and uses these 
edges to pass packets to neighboring nodes. 

When a node is the destination of a packet, the node 
informs the user, or a higher-layer software application, 
of the packet. The node also acts as a manager of its
edges. 

The 160-bit address space consists of all the integers
from 0 to $2^{160} - 1$ and is partitioned into 161 distinct
address classes. To determine the class of a particular
address, count the number of consecutive bits of value 1
at the rightmost (least significant) end of the address. This
count can be between 0 and 160, and thus there are 161
address classes. Address class n is twice as large as class
n + 1 (for n < 159). In fact, a class n address ends with
exactly one bit of value 0 followed by n bits of value 1
(except for class 160, for which all bits have the value 1).
The size of the class n address space is $2^{159-n}$ (except
class 160, which has size 1). To see that we have accounted
for all the addresses, we can sum the sizes of classes 0 to
159 and add 1 for class 160:

$$
S = 1 + \sum_{k=0}^{159} 2^{159-k}
  = 1 + 2^{159} \, \frac{1 - 2^{-160}}{1 - \frac{1}{2}}
  = 1 + 2^{160} - 1 = 2^{160}
$$

Thus the 161 classes together cover all $2^{160}$ possible
addresses.
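The classification rule above can be illustrated with a minimal Python sketch. The function names `address_class` and `class_size` are hypothetical and are not part of the BruNet code base; only the 160-bit width and the trailing-ones rule come from the text.

```python
ADDRESS_BITS = 160  # width of an address, as stated in the text


def address_class(addr: int) -> int:
    """Return the class of an address: the number of consecutive 1 bits
    at the least-significant (rightmost) end."""
    if not 0 <= addr < 2 ** ADDRESS_BITS:
        raise ValueError("address must fit in 160 bits")
    n = 0
    while n < ADDRESS_BITS and (addr >> n) & 1:
        n += 1
    return n


def class_size(n: int) -> int:
    """Number of addresses in class n: 2^(159-n), or 1 for class 160."""
    if not 0 <= n <= ADDRESS_BITS:
        raise ValueError("class must be between 0 and 160")
    if n == ADDRESS_BITS:
        return 1
    return 2 ** (ADDRESS_BITS - 1 - n)


if __name__ == "__main__":
    print(address_class(0b1010))        # an even address is class 0
    print(address_class(2 ** 124 - 1))  # 124 trailing ones
    total = sum(class_size(n) for n in range(ADDRESS_BITS + 1))
    print(total == 2 ** ADDRESS_BITS)   # the classes cover the whole space
```

Because Python integers are arbitrary precision, the sum over all 161 classes can be checked exactly rather than symbolically.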

Address class-0 is the largest. We use class-0 to rep- 
resent addresses on the ring. These "ring" addresses are 
common to both the Chord[9] and Symphony[3] proto- 
cols. We describe the routing algorithm for these ad- 
dresses in Section 4. In addition to the ring addresses in 
class-0, we define class-124 as "directional" addresses. 
Directional addresses indicate that a packet should be 
routed in a particular direction on the ring such as clock- 
wise or counter-clockwise. Directional addresses are 
useful for communicating with nearby nodes on the ring 
as is often needed when joining the network or in DHT 
applications. 

Our system is designed to be a general framework for 
P2P applications. For example, one application of our 
system might be to use class-1 addresses to represent hypercube
addresses such as those used in the Pastry P2P
protocol [11]. This partitioning allows us to easily imple- 
ment new protocols without changing the packet format 
or core libraries. 

3.2 Packet Format 

All system packets begin with a byte that describes the 
type of data contained in the payload, followed by a 
payload. The first packet type is 0x01, which is used 
by nodes to establish connections and discover one an- 
other's BruNet system information. 



Header Field    Start Position (byte)   Length (bytes)
Type            0                       1
Hops            1                       2
TTL             3                       2
Source          5                       20
Destination     25                      20
Payload Type    45                      1

Table 1: Packet format
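As an illustration of the layout in Table 1, the following Python sketch packs and unpacks the 46-byte header with the standard `struct` module. The byte order (big-endian), the treatment of Hops and TTL as unsigned integers, and the function names are assumptions made for this example; the text does not specify them.

```python
import struct

# Field layout from Table 1: Type(1) Hops(2) TTL(2) Source(20) Dest(20) PayloadType(1).
# Big-endian, unsigned fields are an assumption of this sketch.
HEADER = struct.Struct(">BHH20s20sB")
ROUTED_PACKET = 0x02  # packet type used for routed P2P packets


def pack_header(hops, ttl, source, destination, payload_type,
                packet_type=ROUTED_PACKET):
    """Serialize header fields; source and destination are 160-bit integers."""
    return HEADER.pack(
        packet_type,
        hops,
        ttl,
        source.to_bytes(20, "big"),
        destination.to_bytes(20, "big"),
        payload_type,
    )


def unpack_header(data):
    """Parse the first 46 bytes of a packet into a dictionary of fields."""
    packet_type, hops, ttl, src, dst, payload_type = HEADER.unpack_from(data)
    return {
        "type": packet_type,
        "hops": hops,
        "ttl": ttl,
        "source": int.from_bytes(src, "big"),
        "destination": int.from_bytes(dst, "big"),
        "payload_type": payload_type,
    }


if __name__ == "__main__":
    raw = pack_header(hops=3, ttl=10, source=2, destination=2 ** 159,
                      payload_type=7)
    print(len(raw), unpack_header(raw))
```

The field offsets produced by this format string (0, 1, 3, 5, 25, 45) match the start positions in the table.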



The second packet type is 0x02, which is used for 
the routed P2P protocols (this type is in contrast to type 
0x01 packets which are not routed on the overlay and are 
only used when two nodes are directly connecting to one 
another). In many respects, the routed P2P packets are
similar to Ethernet packets but with a few notable differences.
Ethernet has 6-byte addresses, whereas this system
uses 20-byte addresses. Ethernet uses two bytes to denote
the payload type, whereas we use only one. Unlike Ethernet
packets, we do not need to include a checksum (since,
as we discuss in Section 3.3, we assume that the edges
provide accurate packets). Also unlike Ethernet, we do
need to include fields to indicate how far the packet has
traveled (Hops) and how far it is allowed to go (TTL).

Packets may encapsulate many different types of pay- 
loads. For instance, nodes manage their position in the 
network by sending "network structure" packets to other 
nodes. Packets also transport what may be considered 
"application layer" data, such as queries for DHT or file- 
sharing applications. 

3.3 Edges and Connectivity 

In this work, we will say that a pair of nodes has an 
edge between them if they are communicating with one 
another by sending packets over a single overlay hop. 
Any underlying networking protocol which matches this 
requirement is a suitable transport. In fact, different 
edges may work over different transport protocols (such 
as TCP, UDP, etc.). 

Every edge must provide two things: 

• the edge must not pass corrupt packets 

• the edge must know the length of each packet it re- 
ceives. 

We identify endpoints of edges with transport addresses,
for instance BruNet.tcp:192.168.0.1:10030 to
identify an endpoint of a TCP edge at IP address 
192.168.0.1 and port 10030. Generally, the transport ad- 
dress is a pair which contains the protocol and the ad- 
dressing information for that protocol. Currently, we 
have implemented TCP and UDP edges, but in princi-
ple we could also define an Ethernet edge to transport 
BruNet over Ethernet. 
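To make the "protocol plus addressing information" pair concrete, here is a small Python sketch that parses the textual form shown in the example above. It assumes the string consists of a `BruNet.<protocol>` tag, a host, and a port separated by colons; the real BruNet parser and the `TransportAddress` type used here are not described in the text and are hypothetical.

```python
from typing import NamedTuple


class TransportAddress(NamedTuple):
    """Hypothetical container for a parsed transport address."""
    protocol: str
    host: str
    port: int


def parse_transport_address(text: str) -> TransportAddress:
    """Parse strings of the assumed form 'BruNet.tcp:192.168.0.1:10030'."""
    parts = [p.strip() for p in text.split(":")]
    if len(parts) != 3:
        raise ValueError("expected 'BruNet.<protocol>:<host>:<port>'")
    scheme, host, port = parts
    prefix, _, protocol = scheme.partition(".")
    if prefix.lower() != "brunet" or not protocol:
        raise ValueError("missing 'BruNet.<protocol>' prefix")
    return TransportAddress(protocol.lower(), host, int(port))


if __name__ == "__main__":
    print(parse_transport_address("BruNet.tcp:192.168.0.1:10030"))
```

This simple split only handles IPv4-style hosts; an address family whose host part contains colons would need a different syntax.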

Edges are typed with labels. For instance, in the Sym- 
phony protocol, there are edges which go to near neigh- 
bors on the ring and also shortcut connections that cut 
across the ring. The edges are labeled to distinguish 
them. Our framework allows edges to be labeled with 
any string, so a future protocol may be implemented 
which may define new edge labels. 

We assume that each node joins the network by contacting
some node and forming a "leaf" connection. The
leaf connection is used for a newly joined node to boot- 
strap into its proper place in the network. The new node 
bootstraps by asking the node on the other end of the 
leaf connection to act as a proxy for any packets the new 
node would like to send or receive. Once a node has at 
least one leaf connection, it may use that connection to 
get more connections. There are two connection phases: 
making a connection request and the handshaking which 
goes on when two nodes are creating an edge between 
them. 

Consider the case of one node, which we will call the 
source, connecting with a second, which we will call the 
target. To create a new connection, the source sends a 
message to the target through the BruNet network. This 
message includes the BruNet address as well as a list 
of transport addresses corresponding to the source node. 
Once the target node receives the connection request, it 
sends a response which includes the same information 
about the target, namely the target's BruNet address and 
list of transport addresses. After sending the response, 
the target also attempts to create a new edge by using 
some networking transport to contact the source node. 
For instance, when the source node is using UDP, the tar- 
get node will send a UDP packet to the address given in 
the connection request. The target attempts to connect to 
the source using each item in the transport address list. 
If none of these attempts is successful the target gives 
up. On the other end of this exchange, the source node 
should receive both a response to its connection request 
and the new edge connection from the target. Assum- 
ing the transport layer is faster than the BruNet layer 
(which should be true since BruNet is an overlay on the 
transport), the source node should get the target's con- 
nection prior to receiving the response to the connection 
request. If for any reason (such as the existence of a fire- 
wall which we discuss in Section 3.4) the source does not 
get a connection from the target, when it receives the tar- 
get's connection message response, the source initiates a 
connection to the target. 

Assuming one or the other of the nodes is able to
make a connection to the other, the nodes connect and exchange
several pieces of information, which we call the
linking protocol. The first piece of information the nodes
exchange is the local and remote transport addresses that 
each see as accurate for the connection. Due to network 
address translation (NAT), the two nodes may not agree 
on which IP addresses and port numbers they are each us- 
ing, but the information is exchanged so that each node 
can add this new transport address to their list of possible 
transport address endpoints that future nodes may use to 
connect to them. In addition to two peers' transport ad- 
dress information, each node exchanges a list of BruNet 
addresses (which are used for routing on the overlay) and 
transport addresses (which are used for making new con- 
nections) of nearby nodes. In our experience, getting 
connected, sending and receiving packets, and dealing 
with the errors that may occur during this process is the 
most complex aspect of the P2P system. As such it is 
very convenient to design this aspect of the system to be 
reusable by a wide variety of protocols. 

3.4 Firewalls 

Many nodes on the Internet today are behind a firewall or 
a network address translation (NAT) device. Such nodes 
present a challenge to P2P systems as it can be difficult 
for them to become connected to the network and to each 
other. As we discussed in Section 3.3, the BruNet con- 
nection process involves two steps: sending the connec- 
tion request followed by the linking protocol. 

When at least one node is not behind a NAT or a fire- 
wall, our standard connection protocol will result in the 
nodes forming a connection between them. Since our 
connection protocol involves first contacting the target 
over the BruNet network to exchange transport address 
information, both the target and the source have enough 
information to contact the other. So as long as one of the 
two parties is not behind a firewall, the connection will 
take place normally. 

When using UDP, our protocol allows two NATed 
and firewalled nodes to connect. As identified by the 
STUN [18] protocol, there are four types of NAT in use 
today: full cone, restricted cone, port restricted cone and 
symmetric. Like the STUN protocol, we only deal with 
the first three cases, and not with the symmetric NAT. Of 
the first three cases, the port restricted cone is the most 
restrictive; any protocol that works for the port restricted 
case works for the first two, so we describe how we deal 
with the port restricted cone NAT. 

A port restricted cone NAT performs a mapping
from an internal network $(IP_i, port_i)$ pair to an external
$(IP_e, port_e)$ pair. Consider a packet that arrives
at the NAT with destination $(IP_e, port_e)$ and source
$(IP_s, port_s)$. The NAT will only pass this packet if
the internal node $IP_i$ has previously sent a packet with
source $(IP_i, port_i)$ to $(IP_s, port_s)$. So, in order for two
nodes which are both behind a NAT to communicate,
both nodes have to have previously sent a packet to the
other's translated address. Fortunately, since our connection
protocol involves routing the transport address information
over the overlay, both nodes will get transport
address information sufficient to contact the other. Assuming
that both know their translated addresses, each
will send packets to the other's translated address.
If the NATs are not symmetric, they will pass all pack- 
ets after the first. Our linking protocol involves using 
retries with back-off, thus the nodes will be able to send 
the necessary packets to open the connection through the 
NAT. The only issue that remains is how nodes learn their 
translated transport address. As covered in Section 3.3, 
part of our protocol is for each node to echo the transport 
address information it sees to its peer during connection. 
This allows each node to learn its translated address as- 
suming it can make at least one leaf connection to a node 
which is not behind a NAT. 
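The argument above can be checked with a toy model. The following Python sketch simulates two port restricted cone NATs and shows why the first packet is dropped while retries succeed. The class, the example IP addresses, and the port numbers are all hypothetical; this is a simplified model of the NAT behavior described in the text, not BruNet code.

```python
class PortRestrictedConeNat:
    """Toy model: inbound packets pass only if the internal host has
    previously sent a packet to that exact remote (IP, port)."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self._mapping = {}     # internal (ip, port) -> external port
        self._reverse = {}     # external port -> internal (ip, port)
        self._allowed = set()  # (external port, remote endpoint) pairs
        self._next_port = 40000

    def outbound(self, internal, remote):
        """Internal host sends to remote; returns the translated source."""
        if internal not in self._mapping:
            self._mapping[internal] = self._next_port
            self._reverse[self._next_port] = internal
            self._next_port += 1
        ext_port = self._mapping[internal]  # cone NAT: one mapping per host
        self._allowed.add((ext_port, remote))
        return (self.external_ip, ext_port)

    def inbound(self, source, ext_port):
        """Return the internal endpoint if the packet passes, else None."""
        if (ext_port, source) in self._allowed:
            return self._reverse[ext_port]
        return None


def hole_punch():
    """Return which of three successive packets get through."""
    nat_a = PortRestrictedConeNat("198.51.100.1")
    nat_b = PortRestrictedConeNat("203.0.113.1")
    a_int, b_int = ("10.0.0.2", 5000), ("192.168.1.2", 6000)
    leaf = ("192.0.2.10", 7000)  # public node that echoes translated addresses
    a_ext = nat_a.outbound(a_int, leaf)
    b_ext = nat_b.outbound(b_int, leaf)
    delivered = []
    src = nat_a.outbound(a_int, b_ext)  # A -> B: B has not yet sent to A
    delivered.append(nat_b.inbound(src, b_ext[1]) is not None)
    src = nat_b.outbound(b_int, a_ext)  # B -> A: A already sent to B
    delivered.append(nat_a.inbound(src, a_ext[1]) is not None)
    src = nat_a.outbound(a_int, b_ext)  # A retries: now passes
    delivered.append(nat_b.inbound(src, b_ext[1]) is not None)
    return delivered


if __name__ == "__main__":
    print(hole_punch())
```

In this model only the very first packet is lost, which is why retries with back-off in the linking protocol are enough to open the path in both directions.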

Our approach uses the same facts about common NAT 
devices as the STUN protocol except we use the P2P net- 
work instead of a central server to share the translated IP 
information. 

3.5 Routing and Connection Management 

Most P2P systems will have a great deal of overlap in
the concepts we have discussed above; however, significant
differences will emerge when it comes to routing
of packets and the management of connections to peers.
In the BruNet architecture, both routing and connection
management are handled by replaceable components:
routers and connection overlords.

To implement a new protocol, most of the existing 
BruNet system is reused, but a new router object must 
be defined and associated with the address class that will 
be used for that protocol. Additionally, each P2P pro- 
tocol may have different rules for maintaining connec- 
tions to peers including how many connections to main- 
tain and to which peer each node should be connected. 
Connection overlord objects encapsulate the code which 
manages the connections in the system. For instance, in 
the Symphony protocol, each node should have a con- 
nection to its left and right neighbors as well as at least 
one shortcut connection. We implemented a
SymphonyConnectionOverlord which counts the number of each
of these types of connections, initiates new connections 
when needed, and closes connections that are no longer 
needed. 

BruNet was designed to implement unstructured as 
well as structured P2P protocols. Implementing unstruc- 
tured protocols, such as the Gnutella broadcast query 
protocol, is also easy. One need only define a new ad- 
dress class to represent broadcasts, implement a router to 
handle the routing of the broadcast messages and to build
a routing table of known addresses, and finally implement a
connection overlord that makes sure that the node stays
connected to the network as nodes come and go. The con-
nection logic, transport abstraction, packetization, and 
serialization can all be reused between various imple- 
mentations. 

4 An Implementation of Symphony 

In the previous section we discussed the architecture of 
the BruNet P2P framework. In this section we describe 
our implementation of the Symphony 1-D small-world
system. To implement a particular P2P system, we need 
to describe the routing and connection management, in- 
cluding joining and leaving, which we discuss in Sec- 
tions 4.1 and 4.2 respectively. 

We use class-0 addresses for this protocol. Thus, each
node in the network can take one of $2^{159}$ structured
addresses (footnote 1). We interpret these addresses as even
160-bit integers in the range $[0, 2^{160} - 2]$ with this address
space forming a ring. By convention, we say that the ring
increases in the clockwise direction.

4.1 Small World Routing 

The theory that supports structured routing comes from 
works on routable small-worlds [4, 5]. However, we in- 
troduce novel practical routing algorithms, which make 
network maintenance a natural consequence of those 
routing algorithms. As we discuss in Section 3.1, each 
node has an address that can be interpreted as a coor- 
dinate on a ring. As such, there is directionality (e.g. 
clockwise and counterclockwise). There are two mecha- 
nisms for routing on this structure: destination based and 
direction based. 

In direction based routing, we use fixed addresses
(class-124) to refer to "clockwise" and "counterclockwise".
When the packet's HOPS equals its TTL, the
packet is delivered. By setting the TTL, a node can
then communicate with its near-neighbors on the ring.
This might have interesting applications for caching in
DHT systems. Nodes maintain connections to at least the
two nodes nearest to them in both directions. This direction
based routing is what enables a node to find its
near-neighbors in order to connect to them.

Destination based routing is slightly more complex.
This mode of routing refers to the case where one node
wants to address a second node by that second node's
class-0 address, not based on its relative position on the
ring. The simplest approach would be to route to the
neighbor node which is closest to the destination, never
routing to a node that is further away. This routing type is
described in Algorithm 1. Clearly there can be no loops
since each packet must get closer to the destination at
each step. In some cases it may be desirable for a packet
to only be delivered to the exact target class-0 address
as shown in Algorithm 2. Kleinberg showed that the
number of hops is $O(\log^2 N)$ on average between any
two nodes (when each node has 1 correctly distributed
"shortcut" connection) [5, 4, 19]. If $k < \log N$ "shortcuts"
are maintained, the routing latency can be reduced
to $O(\frac{1}{k} \log^2 N)$ hops. This result allows for a trade-off
between node degree and routing latency.

Footnote 1: For randomly selected addresses, the network size
will have to be approximately $2^{79}$ nodes before we are likely
to reuse an address.

Algorithm 1 GreedyNextHop(v, source, target):
This algorithm describes how a packet arriving at v
from source takes its next hop towards the target using
greedy mode. Each hop tries to get closer (without
visiting source) to target. The adjacency list of node
v is denoted Adj[v], and the distance between two nodes
(a, b) in the network is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    u_min ← v
    for all u ∈ Adj[v] do
        d_tmp = DIST_ring(u, target)
        if d_tmp < d_min then
            d_min ← d_tmp
            u_min ← u
        end if
    end for
    if u_min ≠ v ∧ u_min ≠ source then
        Deliver to u_min
    else
        This is the last hop. Deliver locally to v.
    end if

Algorithm 2 ExactNextHop(v, source, target): This
algorithm describes how a packet arriving at v from
source takes its next hop towards the target using
exact mode. Each hop tries to get closer (without visiting
source) to target. The packet is delivered only to the
target and no other node. The adjacency list of node v
is denoted Adj[v], and the distance between two nodes
(a, b) in the network is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    u_min ← v
    if v = target then
        This is the last hop. Deliver locally to v.
    else
        for all u ∈ Adj[v] do
            d_tmp = DIST_ring(u, target)
            if d_tmp < d_min then
                d_min ← d_tmp
                u_min ← u
            end if
        end for
        if u_min ≠ v ∧ u_min ≠ source then
            Deliver to u_min
        end if
    end if
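The greedy rule of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the BruNet router: it assumes that DIST_ring is the shorter of the two ways around the $2^{160}$ ring (the text does not define it precisely), that `source` is the previous hop, and that the overlay is given as a plain adjacency dictionary. All function names are hypothetical.

```python
RING_SIZE = 2 ** 160  # size of the address ring, from the text


def dist_ring(a, b):
    """Assumed ring distance: the shorter way around the ring."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def greedy_next_hop(v, source, target, adj):
    """Return the neighbor to forward to, or None to deliver locally at v."""
    d_min = dist_ring(v, target)
    u_min = v
    for u in adj[v]:
        d_tmp = dist_ring(u, target)
        if d_tmp < d_min:
            d_min, u_min = d_tmp, u
    if u_min != v and u_min != source:
        return u_min
    return None  # last hop: no neighbor is strictly closer


def route(start, target, adj):
    """Follow greedy hops from start and return the list of visited nodes."""
    path, prev, current = [start], None, start
    while True:
        nxt = greedy_next_hop(current, prev, target, adj)
        if nxt is None:
            return path
        path.append(nxt)
        prev, current = current, nxt


if __name__ == "__main__":
    # Eight nodes with near-neighbor edges plus one shortcut 0 <-> 400.
    nodes = [0, 100, 200, 300, 400, 500, 600, 700]
    adj = {n: [] for n in nodes}
    for a, b in zip(nodes, nodes[1:]):
        adj[a].append(b)
        adj[b].append(a)
    adj[0].append(400)
    adj[400].append(0)
    print(route(0, 500, adj))  # uses the shortcut
    print(route(0, 520, adj))  # no node at 520: ends at the closest node
```

Since every hop strictly decreases the distance to the target, the loop in `route` always terminates, which mirrors the no-loops argument in the text.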

Algorithm 3 AnnealingNextHop(v, source, target):
This algorithm describes how a packet arriving at v from
source takes its next hop towards the target using
annealing mode. Each hop tries to get closer (without visiting
source) to target unless that is not possible, in which
case the packet is delivered to v and sent to the next closest
node. The adjacency list of node v is denoted Adj[v],
and the distance between two nodes (a, b) in the network
is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    d_sec ← d_min
    u_min ← v
    u_sec ← v
    for all u ∈ Adj[v] do
        d_tmp = DIST_ring(u, target)
        if d_min ≤ d_tmp < d_sec then
            d_sec ← d_tmp
            u_sec ← u
        else if d_tmp < d_min then
            d_sec ← d_min
            u_sec ← u_min
            d_min ← d_tmp
            u_min ← u
        end if
    end for
    if u_min ≠ v ∧ u_min ≠ source then
        Deliver to u_min
    else
        Deliver locally to v
        Deliver to u_sec
    end if



In a real system there may be some problems to deal 
with. In particular, the ring may be broken by several 
nodes leaving at once. In that case, the ring becomes a 
line. If the line is not reconnected into a ring, a subse- 
quent failure could cause the line to split, which would 
break routability. As such, we add some exceptions to
the simple routing discussed above which make reconnecting
the ring easier: namely, we do not require that
the packet gets closer to its destination on its first hop, as
described in Algorithm 3.

4.2 Joining the Small-World

In order for a node to join the ring, it makes use of Rout- 
ing Algorithm 3: annealing routing. The annealing rout- 
ing tolerates some disorder in the network. Every node 
that joins the ring must have a 160-bit class-0 address. 
This address must be randomly-generated to ensure the 
near uniform distribution of addresses on the ring; thus 
class-0 addresses are obtained by using a secure hash al- 
gorithm or some other source of random bits. After a 
node has a class-0 address, it must find its place in the 
ring. This means that it needs to make a connection to 
the closest node on both the right and left of its own ad- 
dress. Since the new node is not yet connected to the cor- 
rect place in the ring it is not yet able to route messages 
using the routing algorithms described above. The new 
node instead makes use of a node that is correctly placed 
in the ring as a proxy in order to find its place. The new 
node creates a special type of bootstrapping connection 
that does not support any of the routing algorithms above 
but does provide for packets to be sent to the node on the 
other end of the connection. This bootstrapping connec- 
tion allows the new node to communicate with the proxy 
in order to send and receive messages while it is waiting 
to find its place in the ring. The proxy sends a request 
to connect to the new address which is not yet in the net- 
work. Given the new node's absence, the closest node 
on the right and the closest node on the left of the new node
will form connections to the new joining node. At this 
point the new node is at the correct location in the ring 
and can add additional neighbors and shortcut connec- 
tions as needed. Algorithm 4 shows this process. 

Connection is not an instantaneous process. Our im- 
plementation uses two round trips: a link request and 
response, and a status request and response. The link 
messages exchange the node addresses, the IP addresses 
and port numbers, and whether the connection is a near- 
neighbor connection or shortcut connection. The status 
message allows the nodes to communicate some of their 
properties to their neighbors. In particular, the status 
message shares the node address and IP information of 
other nodes which are close to the new neighbor. This 
information allows nodes to verify that their views of the 
network are consistent and make repairs. 

Algorithm 4 JoiningTheRing(v, u): This algorithm
describes how a new node, denoted as v, joins the
structured ring. The proxy that helps v find its place in
the network is called u. The class-0 address of a node is
denoted as ADD(node). ADD(v_c) is the closest address
to ADD(v). PREV(v_c, v) is the closest neighbor of v_c
in the direction of v.

v makes a proxy connection to node u.

v sends a connection request through u to ADD(v).

u sends a connection request to ADD(v).

v_c receives the request and connects to v.

v sends a connection request to PREV(v_c, v).

PREV(v_c, v) connects to v.

v is now in the correct ring location.

In addition to neighbor connections, every node must
also maintain k shortcut connections to other nodes that
are far away in the address space. Specifically, the
distances traveled by all the shortcut connections in the
structured ring must follow a probability distribution
function (pdf) of the following form: $p(d) \propto 1/d$,
where d denotes the distance traveled by the shortcut
connection [4, 5]. We use the local density of addresses to
estimate network size and thus $d_{ave}$, the average
distance between nodes. Then, we choose a random distance d
between $d_{ave}$ and $d_{max} = 2^{160}$ with probability
proportional to $1/d$ and connect to the node closest to
that address using Routing Algorithm 1 (greedy routing). The
method we use to select a proper distance is to define a
random variable x distributed uniformly over [0, 1], and
set:

$$ d = d_{ave} \left( \frac{d_{max}}{d_{ave}} \right)^{x} $$

From the above, we see that:

$$ \mathrm{Prob}(d < L) = \mathrm{Prob}\left( x < \frac{\log(L / d_{ave})}{\log(d_{max} / d_{ave})} \right) $$

which is clearly the CDF for the random variable d to be
distributed proportional to $1/d$ over $(d_{ave}, d_{max})$.
This is repeated k times. The total cost in packets to join
the network is $O(\log^2 N)$, since we need to send $O(k)$
packets and each packet requires $O(\frac{1}{k} \log^2 N)$
hops.
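The distance-selection rule above can be sketched in a few lines.
This is only an illustration, not the BruNet code (which is in C#):
the function names are hypothetical, and the computation is done in
log space so that the 160-bit distances fit in ordinary floats.

```python
import math
import random

D_MAX = 2 ** 160  # size of the 160-bit address space, as in the text


def sample_shortcut_distance(d_ave, d_max=D_MAX, rng=random):
    """Draw d between d_ave and d_max with density proportional to 1/d.

    Implements d = d_ave * (d_max / d_ave) ** x for x uniform on [0, 1).
    """
    x = rng.random()
    log_d = math.log(d_ave) + x * (math.log(d_max) - math.log(d_ave))
    return math.exp(log_d)


def shortcut_cdf(L, d_ave, d_max=D_MAX):
    """Prob(d < L) for the distribution sampled above (d_ave <= L <= d_max)."""
    return (math.log(L) - math.log(d_ave)) / (math.log(d_max) - math.log(d_ave))
```

Calling sample_shortcut_distance k times yields the k target
distances; each is then turned into an address and reached with
greedy routing, as described above.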

5 PlanetLab Experiments 

This section describes the results of the reliability tests
of the BruNet software. All experiments on our
implementation were performed on the global PlanetLab
test-bed. PlanetLab provides a realistic WAN environment in
which to test distributed applications. In fact, PlanetLab
nodes are often highly loaded and represent a very
challenging test environment.

5.1 Experimental Methodology 

PlanetLab gives access to around 400 computers that are
located in many countries around the world. There are
dozens of research projects running simultaneously on
the scarce computational resources provided by PlanetLab.
As a result, PlanetLab provides a measure of application
performance under very adverse computational and
traffic load conditions. For the experiments presented in
this section, around 100 PlanetLab machines were employed.

Figure 2: The 1-D Kleinberg small-world structure requires
that the distances of the shortcut connections have
a pdf $p(d) \propto 1/d$. In this PlanetLab experiment, we
see that the cdf(d) follows the expected logarithmic
distribution for a network of size 1060.

The current implementation is in C# using the Mono
development platform. In order to minimize memory and
other computational resource usage on PlanetLab machines,
we run multiple nodes inside a single Mono runtime
process. As a result, many nodes can reside on a
single machine. However, each node is executed on a
separate thread and maintains its own connections and
data. Furthermore, since class-0 addresses are assigned
randomly, nodes that reside on the same physical machine
are unlikely to be close to each other in the address
space. We note that the UDP transport is used for
all experiments presented in this section (see footnote 2).

In our experiments, we wish to see that the structure of
the network is correct, that the system can indeed route
packets, and that the system is robust to node arrivals and
failures. We analyze the logs of our experiments with
a software tool which shares no code with the BruNet
system itself. The metric we use to measure the robustness
of the network is routability, defined as the fraction of
pairs of nodes which can communicate using the standard
(in this case greedy) routing algorithm.
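As a rough illustration of this metric, the sketch below computes
routability for a small ring from a table of edges. It is not the
authors' log-analysis tool: the function names are hypothetical,
pairs are counted as ordered (source, destination) pairs, and the
greedy rule is simplified to "forward to the connected node closest
to the destination, and give up if none is strictly closer".

```python
from itertools import permutations


def ring_distance(a, b, size):
    """Shortest distance between addresses a and b on a ring of `size` addresses."""
    d = abs(a - b) % size
    return min(d, size - d)


def greedy_route(src, dst, edges, size):
    """Return True if greedy forwarding from src reaches dst."""
    current = src
    while current != dst:
        best = min(
            edges.get(current, ()),
            key=lambda n: ring_distance(n, dst, size),
            default=current,
        )
        if ring_distance(best, dst, size) >= ring_distance(current, dst, size):
            return False  # no neighbor is strictly closer, so the packet is stuck
        current = best
    return True


def routability(edges, size):
    """Fraction of ordered node pairs that greedy routing can connect."""
    pairs = list(permutations(edges, 2))
    reachable = sum(greedy_route(s, t, edges, size) for s, t in pairs)
    return reachable / len(pairs)
```

Because every hop must strictly reduce the distance to the
destination, each route either arrives or stops after finitely many
steps.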

5.2 Structure Verification 

As discussed in Sections 3.1 and 4, all nodes are
identified by unique 160-bit addresses, which can be
interpreted as integers; nodes are arranged in a ring, with
the convention that the integer representation of the node
addresses increases in the clockwise direction. Furthermore,
our structured small-world routing network requires that
each node keep neighbor connections to the two closest
class-0 addresses in the clockwise direction and the two
closest in the counterclockwise direction. In other words,
the structured ring is correct if and only if the following
is true: every node has connections to its first and second
class-0 neighbors in both the clockwise and the
counterclockwise directions in the address space.

Footnote 2: We have verified that the system on a TCP
transport delivers comparable performance to UDP.
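The ring-correctness condition of Section 5.2 (every node connected
to its first and second class-0 neighbors on each side) can be
checked mechanically. The following is a minimal sketch with
hypothetical names; it assumes the connection table has already been
extracted from logs as a mapping from each node address to the set
of addresses it is connected to.

```python
def ring_is_correct(connections, per_side=2):
    """Check that each node is connected to its `per_side` nearest ring neighbors."""
    ring = sorted(connections)  # addresses in clockwise (increasing) order
    n = len(ring)
    for i, addr in enumerate(ring):
        for step in range(1, per_side + 1):
            clockwise = ring[(i + step) % n]
            counterclockwise = ring[(i - step) % n]
            for neighbor in (clockwise, counterclockwise):
                # In very small rings a neighbor index can wrap back to the node itself.
                if neighbor != addr and neighbor not in connections[addr]:
                    return False
    return True
```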

We have successfully deployed a correct structured
ring of size 1060 nodes on PlanetLab. It is difficult to
see much in visualizations of such large graphs; however,
we present visualizations of networks of various sizes in
Figure 1 and Figures 9-11.

We verified the correctness of the shortcut distance
distribution as follows: after the deployment of a correct
1060-node structured ring, all the shortcut connection
distances were extracted from the experiment logs. The
cumulative distribution function (cdf) of the shortcut
distances is plotted in Figure 2. Note that the experimental
cdf curve is in good agreement with the expected curve:
$cdf(d) \propto \log(d)$.

5.3 Churn 

Nodes do not stay in a P2P network indefinitely. One of 
the most striking aspects of the P2P network paradigm is 
that we assume that nodes are fundamentally faulty and 
will join and leave a network unexpectedly. Any real sys- 
tem must deal with unexpected arrivals and departures, 
which is called churn. 

A major question is: will a node complete the joining
process correctly, in the presence of a slightly disordered
network, before the node departs? There are two important
time scales in the churn process: the mean round-trip-time
(RTT) between the hosts at the IP layer, and the
mean session time of the node. As the session time
approaches the RTT, clearly the system will not work
properly. Since each node requires two neighbor connections
and at least one shortcut connection, the time required to
establish the node will be much greater than the RTT.

In our experiment, we created a correct network of 980 
nodes on PlanetLab. Once the network was correct, we 
then started the system churning for 25 minutes. Each 
second, with a fixed probability, every node abruptly 
goes offline, and then rejoins the network. This corre- 
sponds to an exponential distribution on session time. 
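To make the churn model concrete: if every node goes offline with a
fixed probability p in each one-second step, the session length in
seconds is geometrically distributed with mean 1/p, which for small
p is closely approximated by an exponential distribution. The sketch
below uses hypothetical helper names and treats the one-second step
as the only time unit; it converts a target mean session time into p
and simulates session lengths.

```python
import random


def departure_probability(mean_session_minutes):
    """Per-second departure probability giving the requested mean session time."""
    return 1.0 / (mean_session_minutes * 60.0)


def simulate_session_seconds(p, rng):
    """Count one-second steps until the node departs (geometric distribution)."""
    seconds = 1
    while rng.random() >= p:
        seconds += 1
    return seconds
```

For example, a mean session time of 12 minutes corresponds to
p = 1/720, or roughly 0.0014 per second.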

Figure 3 shows the results of our experiment. We
find that when the mean session time is above 12 minutes,
the system is more than 99% routable. However, as the mean
session time decreases to 5.7 minutes, the system becomes
significantly more disordered, with a routability of 84%.
Further decreasing the mean session time causes the system
to fall apart and tend to very low values of routability.
Exactly how the system transitions from highly routable to
non-routable is very interesting, but is left to future
work.

Figure 3: This experiment measures routability of a
network of size 980 as a function of the mean session time
for each node. Once mean session time is above 10 minutes,
the system has nearly perfect routability.

Our churn model is equivalent to Poisson arrival and
departure processes: the number of nodes that depart
in any interval is described by the Poisson distribution.
Real systems do not exhibit Poissonian churn, but instead
exhibit a heavy-tailed distribution of session times:
the median uptime is often low (a few minutes) but there
are many nodes with very long uptime [20]. Simulations
which have compared Poissonian churn to churn rates
obtained from real P2P traces have found that real traces
are comparable to Poissonian churn with mean session
times of around 100 minutes [21]. Thus, since our system
can easily handle mean session times of 12 minutes,
the system should perform very well in real environments
with real loads.

We note that the cost of joining the network for Symphony
is $O(\log^2 N)$, and this cost comes into play when
considering churn resistance. We believe that P2P systems
with lower joining costs should be more churn resistant.
For instance, in Viceroy [12], joins cost $O(\log N)$.
Implementing Viceroy within our framework would not be
difficult.

5.4 Massive Joins and Failures 

One outstanding feature of this system is its ability to
maintain a correct structure under diverse node dynamics
including massive node insertions, massive node failures
and even the merging of two formerly disconnected
rings. In Figure 4 we observe that nearly every pair of
nodes in the network can communicate using structured
routing even under adverse conditions such as massive
node joins and failures.

Given that the primary objective of the presented sys- 
tem is overlay routing, an important performance met- 
ric is the fraction of the pairs of nodes in the network 
that can communicate with each other; this is denoted as 
routability. To investigate how robust the system is to 
massive changes in network connectivity, we start with 
a completely routable, 460-node PlanetLab deployment 
and insert another 450 nodes into the network simulta- 
neously. This experiment is depicted in Figure 4. Less 
than one minute after the massive join the fraction of the 
network that is mutually routable falls to 0.65. Within 
another minute the fraction rebounds to 0.90. Within 11 
minutes of the massive join the entire 910-node network 
is routable. 

A similar experiment was reported for Tapestry [7],
in which a 325-node Tapestry network experiences a 60%
massive join, bringing the network size to about 525
nodes. Prior to the massive join the routability was in
the high 90% range but not 100%. Just after the join the
routability falls below 0.70 and then rebounds to about
0.95 within 10 minutes. However, even after 60 minutes
Tapestry is still only about 95% routable. Thus the
presented system exhibits good robustness compared to
Tapestry under these conditions. It should be noted that
Tapestry has published fault-correcting protocols [22]
designed to improve robustness under these types of node
dynamics. These additional protocols from Tapestry have
been tested in a LAN cluster but apparently not in a WAN
environment such as PlanetLab.

The system can also manage the merging of multiple
disconnected structured rings into a single ring, as seen
in Figures 9-11. This merging experiment was conducted as
follows. We deployed two separate networks of sizes 470
and 499 respectively on PlanetLab; each network was
totally unaware of the existence of the other (i.e., they
shared no nodes). After both networks had formed correct
rings, we deployed a single node that was connected to
nodes in both networks. As a result, the two previously
disconnected rings were merged into a single ring of size
970. The time for the two correct rings to merge into a
single large correct ring is approximately 7 minutes.
Figures 5-8 show an example of how the merging dynamics
work. The exchange of neighbor lists in the connection
protocol causes the two rings to be sewn together,
analogously to zipping the two halves of a zipper
together. Based on this zipping action it is clear that it
will take O(N) time for two rings to correctly merge if
there is a single contact point between the rings.

As demonstrated by this ring merging experiment, networks
that have become split due to catastrophic outages
can easily join back together. These findings indicate that
the network will recover gracefully after major
infrastructure outages that fracture or disable large
fractions of the underlying physical layer network.

Figure 4: The network is very robust during gradual joins, massive joins and massive failures of nodes. After abrupt
changes in connectivity, the network structure heals back to a perfect ring very rapidly and achieves overwhelming
percentage routability long before the ring is completely correct. This demonstrates the applicability of the system to
highly dynamic applications. Moreover, from examining the bottommost figure, one can observe that the number of
missing edges in the network decreases exponentially fast in time after the massive join of 450 nodes.

Figure 5: Two distinct routable rings denoted as Ring 1
and Ring 2 can be merged into a large routable ring. Here
we depict Ring 1 merging with Ring 2.

Figure 6: "C" connects to "B" and "D", the two closest
nodes on Ring 2. As a normal part of the connection
protocol, "C" sends its neighbor lists to "B" and "D".

Figure 7: Based on the neighbor-list information obtained
from "C" while connecting, "B" connects to "A" and "D"
connects to "E".

Figure 8: The network is now correctly ordered but there
are many more connections than are needed. Each node
maintains k connections to the closest neighbors on the
right and left (k = 1 in this example). Each node will trim
the excess connections until only the k closest on each
side remain.

Figure 9: This network on PlanetLab has 499 nodes.

Figure 10: This network on PlanetLab has 470 nodes.

Figure 11: The separate rings are merged together to
form a single 970-node network on PlanetLab. The entire
merge process takes 7 minutes.

6 Conclusion



We present a new software framework for implementing
P2P protocols. We use this framework to present the
first implementation of the 1-D Kleinberg routable
small-world model. We have shown that the C#
implementation produces networks that have the required
topological structure to provide scalable structured
small-world routing. The system is also very robust in the
presence of large node dynamics including massive joins,
massive failures, disconnected ring merges and churn. Given
that this system is intended to provide overlay routing
over heterogeneous physical layers and transport protocols,
this robustness is critical to enabling reliable overlay
applications.

We anticipate that this framework will be valuable to 
other researchers to allow them to implement new P2P 
routing and connection management protocols, without 
the need to reimplement solutions to common problems 
of node handshaking, packet sending and receiving, and 
abstraction of underlying transports, such as UDP and 
TCP. Future work will include using this framework to
implement unstructured P2P protocols along with
structured P2P protocols.

Abstract

We introduce BruNet, a general P2P software framework
which we use to produce the first implementation
of Symphony, a 1-D Kleinberg small-world architecture.
Our framework is designed to easily implement and measure
different P2P protocols over different transport layers
such as TCP or UDP. This paper discusses our
implementation of the Symphony network, which allows
each node to keep k < log N shortcut connections and
to route to any other node with a short average delay
of $O(\frac{1}{k} \log^2 N)$. We present experimental
results taken from several PlanetLab deployments of size up
to 1060 nodes. These successful deployments represent some
of the largest PlanetLab deployments of P2P overlays found
in the literature, and show our implementation's robustness
to massive node dynamics in a WAN environment.

1 Introduction: Motivation and Summary 
of Results 

Peer-To-Peer (P2P) networking is an increasingly pop- 
ular network model where nodes communicate directly 
without utilizing a centralized server. In recent years, 
P2P file-sharing applications have flourished. A recent 
study shows that P2P systems are responsible for ap- 
proximately one half of the network traffic at a major 
university [?] and comprise a significant fraction of total 
Internet traffic. For a review of P2P search systems, see 
[?]. 

There are three novel contributions reported in this
work. First, we describe a new P2P software framework
called BruNet. The BruNet framework handles most of
the issues common to all P2P protocols such as dealing
with firewalls and NATs, connecting nodes, and routing
packets. Second, we use the BruNet P2P framework
to implement Symphony [?], a 1-D Kleinberg routable
small-world network [?, ?]. This is the first
implementation of a 1-D routable small-world network.
Third, we report on large-scale PlanetLab tests involving
more than 1000 nodes, which puts the P2P networks described
here among the largest P2P networks to be tested on
PlanetLab.

Our BruNet software architecture manages P2P packet 
routing and connection maintenance. Given a packet 
with a particular destination address A, the system will 
deliver the packet to the node closest to that address. 
This sort of routing primitive may be used to build a dis- 
tributed hash table (DHT), which is common in the P2P 
literature. Clearly, the success and efficacy of such an 
ad-hoc addressing and routing scheme depends on the 
robustness of the overlay structured networks. 

The deployment of DHT P2P systems such as the
Kademlia-based [?] eDonkey, which already supports
about a million simultaneous users, indicates that
large-scale overlay networks are feasible. The existence of
such large-scale DHT systems is impressive; however,
the performance of P2P networks at that scale has not
yet been systematically studied. While we have not yet
scaled to one million nodes, our experiments with more
than 1,000 nodes are among the largest P2P networks to be
tested on PlanetLab. The data we obtained from deployments
of our system on PlanetLab show that the structured
routing network can indeed be bootstrapped from
a random initial network, and can be robust to high rates
of joins and departures of participating nodes.

We chose Symphony, the 1-D Kleinberg routable
small-world network [?, ?, ?], as the topology for the
structured overlay network. This ringlike address space
entails simple routing calculations and requires very low
node state. Our structured overlay is currently the only
implementation of a 1-D Kleinberg routable small-world
network; as reviewed in the next section, a number of
schemes that utilize the 1-D small-world model have
been proposed, but to the best knowledge of the authors
none have been deployed and tested in a WAN
environment. Kleinberg proved that properly designed
small-world networks could support efficient decentralized
routing with $O(\log^2 N)$ latency. The proposed
system uses a 160-bit address space to construct a ring
structure. Shortcuts are made in this ringlike address
space according to a specific probability distribution.
The analysis and simulation results in [?] show that
maintaining k < log N long-range neighbors improves
routing latency to $O(\frac{1}{k} \log^2 N)$.

Our functioning implementation adds several new features
to the routable small-world model, including expanded
routing rules to permit firewall traversal and
bootstrapping and also to obtain a structured 1-D ring
starting from any initially connected network. Networks of
up to 1060 nodes have been deployed on PlanetLab, as
we discuss later in Section 5. A key goal of this effort
is that the network remains routable in the presence of
massive node dynamics including massive joins, massive
failures, ring merging and churn. The system's robustness
under heavy node dynamics compares very favorably
to the results published for Tapestry [?]; moreover,
our deployment has more than twice the number of nodes
dealt with in [?].

The paper is laid out as follows: we first discuss re- 
lated work in the following section. Section 3 describes 
the BruNet software architecture and system compo- 
nents. Section 3 also includes our approach to travers- 
ing firewalls and NAT devices. Section 4 provides de- 
tails on our Symphony implementation. Finally, Section 
5 presents PlanetLab experiments that demonstrate the 
correctness and robustness of the network. 

2 Related Work 

There has been much recent work on producing structured
P2P overlays with distributed hash table (DHT)
interfaces. Some examples of these structured systems
include [?, ?, ?, ?, ?, ?, ?, ?, ?, ?]. The main advantages
of these structured DHT systems are scalable object
location in $O(\log N)$ or $O(\log^2 N)$ steps and the
guaranteed retrieval of any existing object.

This paper reports on an implementation and measure- 
ments rather than simulation of a P2P network. While 
there are many reports of simulations of structured P2P 
protocols, the measurement of such protocols in real 
world WAN environments has rarely been addressed 
(e.g. [?]). 

Among the existing structured systems, there are several
Kleinberg-inspired small-world P2P overlays: Symphony [?]
provides a detailed software design for a DHT
system based on a unit-circumference ring; Accordion
[?] is a proposed small-world-based structured system
designed to provide efficient bandwidth management of
the distributed routing tables; Mercury [?] presents a
protocol for supporting multi-attribute range queries that
layers on top of a small-world-based ring; SWAN [?] is
an implemented multi-agent system based on the original
2-D Kleinberg model [?]. Of the aforementioned small-world
P2P systems, only SWAN has been implemented,
while performance estimates for Symphony, Accordion,
and Mercury are based solely on simulations. Therefore
the presented system appears to be the first
implementation of the 1-D ring-based Kleinberg routable
small-world network.

Figure 1: The structured ring permits efficient routing
between nodes. This 200-node network was run on PlanetLab.

3 BruNet System Architecture 

The BruNet P2P software framework is designed to al- 
low easy implementations of many different protocols. 
The software is implemented in the C# programming 
language using the Mono compiler and virtual machine 
on GNU/Linux based systems. This section provides a 
general overview of the basic primitives of the system, 
namely nodes, addresses, edges, routers and connection 
overlords. 

3.1 Nodes and Addressing 

The active elements in the system are called nodes. Each 
node can send packets, receive packets, and route pack- 
ets. A particular computer system, such as a desktop PC 
or a server system may host one or more nodes. The 
node is envisioned as an agent for a user or software 
application. Each node has exactly one address, which 
uniquely identifies that node on the network. Addition- 
ally, each node maintains several edges and uses these 
edges to pass packets to neighboring nodes. 

When a node is the destination of a packet, the node
informs the user, or a higher-layer software application,
of the packet. The node also acts as a manager of its
edges.

The 160-bit address space consists of all the integers
from 0 to $2^{160} - 1$ and is partitioned into 161
distinct address classes. To determine the class of a
particular address, count the number of consecutive bits of
value 1 on the rightmost part of the address. There can be
between 0 and 160, and thus there are 161 address classes.
Clearly, address class n is twice as large as class n + 1
(for n up to 158). In fact, a class n address ends with
exactly one bit of value 0 followed by n bits of value 1
(except for class 160, for which all bits have the value
1). The size of the class n address space is $2^{159-n}$
(except class 160, which has size 1). To see that we have
accounted for all the addresses, we can sum the size of
each class and see that we get all $2^{160}$ addresses:

$$ S = 1 + \sum_{k=0}^{159} 2^{159-k} = 1 + 2^{159} \left( \frac{1 - 2^{-160}}{1 - 2^{-1}} \right) = 1 + 2^{160} - 1 = 2^{160} $$

So if we count all classes from 0 to 159 (and add 1 for
class 160), we get all $2^{160}$ possible addresses.
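The class rule is easy to state in code. The sketch below is an
illustration with hypothetical function names rather than the BruNet
API; it treats an address as a Python integer, counts the trailing
1 bits, and confirms that the class sizes add up to the whole
address space.

```python
ADDRESS_BITS = 160


def address_class(address):
    """Number of consecutive 1 bits at the low-order (rightmost) end."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address must fit in 160 bits")
    n = 0
    while n < ADDRESS_BITS and (address >> n) & 1:
        n += 1
    return n


def class_size(n):
    """Number of addresses in class n: 2^(159 - n), or 1 for class 160."""
    if n == ADDRESS_BITS:
        return 1
    return 2 ** (ADDRESS_BITS - 1 - n)


# The 161 class sizes together cover all 2^160 addresses.
TOTAL = sum(class_size(n) for n in range(ADDRESS_BITS + 1))
```

Every even integer is a class-0 address, which is why class 0 covers
half of the address space.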

Address class-0 is the largest. We use class-0 to rep- 
resent addresses on the ring. These "ring" addresses are 
common to both the Chord[?] and Symphony[?] proto- 
cols. We describe the routing algorithm for these ad- 
dresses in Section 4. In addition to the ring addresses in 
class-0, we define class-124 as "directional" addresses. 
Directional addresses indicate that a packet should be 
routed in a particular direction on the ring such as clock- 
wise or counter-clockwise. Directional addresses are 
useful for communicating with nearby nodes on the ring 
as is often needed when joining the network or in DHT 
applications. 

Our system is designed to be a general framework for
P2P applications. For example, one application of our
system might be to use class-1 addresses to represent
hypercube addresses such as those used in the Pastry P2P
protocol [?]. This partitioning allows us to easily
implement new protocols without changing the packet format
or core libraries.

3.2 Packet Format 

All system packets begin with a byte that describes the 
type of data contained in the payload, followed by a 
payload. The first packet type is 0x01, which is used 
by nodes to establish connections and discover one an- 
other's BruNet system information. 

The second packet type is 0x02, which is used for
the routed P2P protocols (this type is in contrast to type
0x01 packets, which are not routed on the overlay and are
only used when two nodes are directly connecting to one
another). In many respects, the routed P2P packets are
similar to Ethernet packets but with a few notable
differences. Ethernet has 6-byte addresses, whereas this
system uses 20-byte addresses. Ethernet uses two bytes to
denote the payload type, whereas we use only one. Unlike
Ethernet packets, we do not need to include a checksum
(since, as we discuss in Section 3.3, we assume that the
edges provide accurate packets). Also unlike Ethernet, we
do need to include fields to indicate how far the packet
has traveled and how far it is allowed to go.

Header Field      Start Position    Length (bytes)
Type              0                 1
Hops              1                 2
TTL               3                 2
Source            5                 20
Destination       25                20
Payload Type      45                1

Table 1: Packet format
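The header layout in Table 1 (Type, Hops, TTL, Source, Destination,
Payload Type, 46 bytes in total) can be illustrated with a small
packing routine. This is a sketch only: the function names are
hypothetical, the Type field is taken to start at byte 0, and
big-endian byte order for the two-byte Hops and TTL fields is an
assumption, since the text does not state the byte order.

```python
import struct

# Field order and sizes follow Table 1; big-endian byte order is an assumption.
HEADER = struct.Struct(">BHH20s20sB")


def pack_header(packet_type, hops, ttl, source, destination, payload_type):
    """Build the 46-byte routed-packet header from its six fields."""
    if len(source) != 20 or len(destination) != 20:
        raise ValueError("addresses must be exactly 20 bytes (160 bits)")
    return HEADER.pack(packet_type, hops, ttl, source, destination, payload_type)


def unpack_header(packet):
    """Return (type, hops, ttl, source, destination, payload_type)."""
    return HEADER.unpack_from(packet)
```

Anything after the first 46 bytes of a packet is the payload, whose
interpretation is selected by the Payload Type byte.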

Packets may encapsulate many different types of pay- 
loads. For instance, nodes manage their position in the 
network by sending "network structure" packets to other 
nodes. Packets also transport what may be considered 
"application layer" data, such as queries for DHT or file- 
sharing applications. 

3.3 Edges and Connectivity 

In this work, we will say that a pair of nodes has an 
edge between them if they are communicating with one 
another by sending packets over a single overlay hop. 
Any underlying networking protocol which matches this 
requirement is a suitable transport. In fact, different 
edges may work over different transport protocols (such 
as TCP, UDP, etc.). 

Every edge must provide two things: 

• the edge must not pass corrupt packets 

• the edge must know the length of each packet it re- 
ceives. 

We identify endpoints of edges with transport addresses,
for instance brunet.tcp:192.168.0.1:10030 to
identify an endpoint of a TCP edge at IP address
192.168.0.1 and port 10030. Generally, the transport
address is a pair which contains the protocol and the
addressing information for that protocol. Currently, we
have implemented TCP and UDP edges, but in principle
we could also define an Ethernet edge to transport
BruNet over Ethernet.

Edges are typed with labels. For instance, in the Symphony
protocol, there are edges which go to near neighbors
on the ring and also shortcut connections that cut
across the ring. The edges are labeled to distinguish
them. Our framework allows edges to be labeled with
any string, so a future protocol may define new edge
labels.

We assume that each node joins the network by contacting
some node and forming a "leaf" connection. The
leaf connection is used by a newly joined node to bootstrap
into its proper place in the network. The new node
bootstraps by asking the node on the other end of the leaf
connection to act as a proxy for any packets the new node
would like to send or receive. Once a node has at least
one leaf connection, it may use that connection to get
more connections. There are two phases of connection:
making a connection request, and the handshaking which
goes on when two nodes are creating an edge between
them.

Consider the case of one node, which we will call the 
source, connecting with a second, which we will call the 
target. To create a new connection, the source sends a 
message to the target through the BruNet network. This 
message includes the BruNet address as well as a list 
of transport addresses corresponding to the source node. 
Once the target node receives the connection request, it 
sends a response which includes the same information 
about the target, namely the target's BruNet address and 
list of transport addresses. After sending the response, 
the target also attempts to create a new edge by using 
some networking transport to contact the source node. 
For instance, when the source node is using UDP, the tar- 
get node will send a UDP packet to the address given in 
the connection request. The target attempts to connect to 
the source using each item in the transport address list. 
If none of these attempts is successful the target gives 
up. On the other end of this exchange, the source node 
should receive both a response to its connection request 
and the new edge connection from the target. Assum- 
ing the transport layer is faster than the BruNet layer 
(which should be true since BruNet is an overlay on the 
transport), the source node should get the target's con- 
nection prior to receiving the response to the connection 
request. If for any reason (such as the existence of a fire- 
wall which we discuss in Section 3.4) the source does not 
get a connection from the target, when it receives the tar- 
get's connection message response, the source initiates a 
connection to the target. 

Assuming one or the other of the nodes is able to make
a connection to the other, the nodes connect and exchange
several pieces of information, which we call the linking
protocol. The first piece of information the nodes
exchange is the local and remote transport addresses that
each sees as accurate for the connection. Due to network
address translation (NAT), the two nodes may not agree
on which IP addresses and port numbers they are each using,
but the information is exchanged so that each node
can add this new transport address to its list of possible
transport address endpoints that future nodes may use to
connect to it. In addition to the two peers' transport
address information, each node exchanges a list of BruNet
addresses (which are used for routing on the overlay) and
transport addresses (which are used for making new
connections) of nearby nodes. In our experience, getting
connected, sending and receiving packets, and dealing
with the errors that may occur during this process is the
most complex aspect of the P2P system. As such it is
very convenient to design this aspect of the system to be
reusable by a wide variety of protocols.

3.4 Firewalls 

Many nodes on the Internet today are behind a firewall or 
a network address translation (NAT) device. Such nodes 
present a challenge to P2P systems as it can be difficult 
for them to become connected to the network and to each 
other. As we discussed in Section 3.3, the BruNet con- 
nection process involves two steps: sending the connec- 
tion request followed by the linking protocol. 

When at least one node is not behind a NAT or a fire- 
wall, our standard connection protocol will result in the 
nodes forming a connection between them. Since our 
connection protocol involves first contacting the target 
over the BruNet network to exchange transport address 
information, both the target and the source have enough 
information to contact the other. So as long as one of the 
two parties is not behind a firewall, the connection will 
take place normally. 

When using UDP, our protocol allows two NATed
and firewalled nodes to connect. As identified by the
STUN [?] protocol, there are four types of NAT in use
today: full cone, restricted cone, port restricted cone and
symmetric. Like the STUN protocol, we only deal with the
first three cases, and not with the symmetric NAT. Of the
first three cases, the port restricted cone is the most
restrictive; any protocol that works for the port
restricted case works for the first two, so we describe how
we deal with the port restricted cone NAT.

A port restricted cone NAT performs a mapping
from an internal network $(IP_i, port_i)$ pair to an
external $(IP_e, port_e)$ pair. Consider a packet that arrives
at the NAT with destination $(IP_e, port_e)$ and source
$(IP_s, port_s)$. The NAT will only pass this packet if
the internal node $IP_i$ has previously sent a packet with
source $(IP_i, port_i)$ to $(IP_s, port_s)$. So, in order for two
nodes which are both behind a NAT to communicate,
both nodes have to have previously sent a packet to the
other's translated address. Fortunately, since our connection
protocol involves routing the transport address information
over the overlay, both nodes will get transport address
information sufficient to contact the other. Assuming
they both know their translated addresses, each will
send packets to the other's translated address. If the
NATs are not symmetric, they will pass all packets
after the first. Our linking protocol involves using
retries with back-off, thus the nodes will be able to send
the necessary packets to open the connection through the
NAT. The only issue that remains is how nodes learn their
translated transport address. As covered in Section 3.3,
part of our protocol is for each node to echo the transport
address information it sees to its peer during connection.
This allows each node to learn its translated address,
assuming it can make at least one leaf connection to a node
which is not behind a NAT.
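
The exchange described above can be illustrated with a small
simulation. This is a minimal sketch under stated assumptions,
not BruNet code: the class PortRestrictedConeNat, the helper
send, and all IP addresses and ports are hypothetical names
chosen for the example. It models only the filtering rule of a
port restricted cone NAT and shows why retries are needed.

```python
class PortRestrictedConeNat:
    """Toy model of a port restricted cone NAT (illustrative, not BruNet code)."""

    def __init__(self, external_ip, first_port=40000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.mapping = {}     # (IP_i, port_i) -> port_e
        self.allowed = set()  # (port_e, (IP_s, port_s)) opened by outbound packets

    def outbound(self, internal, remote):
        """Internal endpoint sends to `remote`; returns the translated source."""
        if internal not in self.mapping:
            self.mapping[internal] = self.next_port
            self.next_port += 1
        port_e = self.mapping[internal]
        self.allowed.add((port_e, remote))
        return (self.external_ip, port_e)

    def inbound(self, source, port_e):
        """Pass a packet only if the mapped host already sent to `source`."""
        return (port_e, source) in self.allowed


def send(src_nat, src_host, dst_nat, dst_translated):
    """Send one packet between two NATed hosts; True if the far NAT passes it."""
    translated_source = src_nat.outbound(src_host, dst_translated)
    return dst_nat.inbound(translated_source, dst_translated[1])


nat_a, host_a = PortRestrictedConeNat("198.51.100.1"), ("10.0.0.2", 5000)
nat_b, host_b = PortRestrictedConeNat("203.0.113.1"), ("192.168.1.7", 5000)
public_peer = ("192.0.2.10", 6000)  # assumed peer that is not behind a NAT

# Each node learns its translated address from the public peer's echo.
addr_a = nat_a.outbound(host_a, public_peer)
addr_b = nat_b.outbound(host_b, public_peer)

results = [
    send(nat_a, host_a, nat_b, addr_b),  # first packet: B's NAT has no entry for A yet
    send(nat_b, host_b, nat_a, addr_a),  # A already sent to B, so A's NAT passes this
    send(nat_a, host_a, nat_b, addr_b),  # retry: B has now sent to A
]
print(results)  # [False, True, True]
```

In this toy run the first packet is dropped and every later
packet is passed, which is the behavior the retry with
back-off in the linking protocol relies on.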

Our approach uses the same facts about common NAT 
devices as the STUN protocol except we use the P2P net- 
work instead of a central server to share the translated IP 
information. 

3.5 Routing and Connection Management 

Most P2P systems will have a great deal of overlap in
the concepts we have discussed above; however,
significant differences emerge when it comes to the routing
of packets and the management of connections to peers.
In the BruNet architecture, routing and connection
management are each handled by a dedicated component.

To implement a new protocol, most of the existing 
BruNet system is reused, but a new router object must 
be defined and associated with the address class that will 
be used for that protocol. Additionally, each P2P pro- 
tocol may have different rules for maintaining connec- 
tions to peers including how many connections to main- 
tain and to which peer each node should be connected. 
Connection overlord objects encapsulate the code which 
manages the connections in the system. For instance, in 
the Symphony protocol, each node should have a con- 
nection to its left and right neighbors as well as at least 
one shortcut connection. We implemented a
SymphonyConnectionOverlord which counts the number of each
of these types of connections, initiates new connections 
when needed, and closes connections that are no longer 
needed. 
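
The bookkeeping performed by such a connection overlord can be
sketched roughly as follows. This is only an illustration: the
class OverlordSketch, its method names, and the WANTED table
are hypothetical and are not taken from the BruNet code base.
The sketch counts connections by type and reports which types
are missing or in surplus.

```python
from collections import Counter


class OverlordSketch:
    """Hypothetical stand-in for a connection overlord; not BruNet's API."""

    # Assumed targets, following the text: one left neighbor, one right
    # neighbor, and at least one shortcut.
    WANTED = {"left": 1, "right": 1, "shortcut": 1}

    def __init__(self):
        self.connections = {}  # peer address -> connection type

    def add(self, peer, kind):
        self.connections[peer] = kind

    def remove(self, peer):
        self.connections.pop(peer, None)

    def missing(self):
        """Connection types for which new connections should be initiated."""
        counts = Counter(self.connections.values())
        return {k: n - counts[k] for k, n in self.WANTED.items() if counts[k] < n}

    def surplus(self):
        """Neighbor types with more connections than needed (candidates to close)."""
        counts = Counter(self.connections.values())
        return {k: counts[k] - self.WANTED[k]
                for k in ("left", "right") if counts[k] > self.WANTED[k]}


overlord = OverlordSketch()
overlord.add(100, "left")
overlord.add(90, "left")
print(overlord.missing())  # {'right': 1, 'shortcut': 1}
print(overlord.surplus())  # {'left': 1}
```

Shortcuts are excluded from surplus here because the text only
requires "at least one"; a real overlord would also need a rule
for choosing which surplus connection to close.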

BruNet was designed to implement unstructured as 
well as structured P2P protocols. Implementing unstruc- 
tured protocols, such as the Gnutella broadcast query 
protocol, is also easy. One need only define a new ad- 
dress class to represent broadcasts, implement a router 
to handle the routing of the broadcast messages and to 
build a routing table of known addresses, and finally a 
connection overlord that makes sure that the node stays 
connected to the network as nodes come and go. The
connection logic, transport abstraction, packetizing, and
serializing can all be reused between various
implementations.

4 An Implementation of Symphony 

In the previous section we discussed the architecture of 
the BruNet P2P framework. In this section we describe 
our implementation of the Symphony 1-D small-world 
system. To implement a particular P2P system, we need 
to describe the routing and connection management, in- 
cluding joining and leaving, which we discuss in Sec- 
tions 4.1 and 4.2 respectively. 

We use class-0 addresses for this protocol. Thus, each
node in the network can take one of $2^{159}$ structured
addresses.¹ We interpret these addresses as even 160-bit
integers in the range $[0, 2^{160} - 2]$, with this address
space forming a ring. By convention, we say that the ring
increases in the clockwise direction.
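
As a concrete illustration of this address space, the sketch
below derives an even 160-bit address and measures distance on
the ring. Two points are assumptions rather than statements
from the text: SHA-1 is used only because it happens to yield
160 bits, and dist_ring takes the shorter of the two arcs
between the addresses. The function names are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 160


def class0_address(seed: bytes) -> int:
    """Map arbitrary bytes to an even integer in [0, 2**160 - 2]."""
    digest = hashlib.sha1(seed).digest()        # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") & ~1   # clear the lowest bit


def dist_ring(a: int, b: int) -> int:
    """Ring distance, assumed here to be the shorter arc between a and b."""
    d = (b - a) % RING_SIZE
    return min(d, RING_SIZE - d)


addr = class0_address(b"example-node")
print(addr % 2 == 0, 0 <= addr <= RING_SIZE - 2)
print(dist_ring(0, RING_SIZE - 2))  # 2: the ring wraps around
```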

4.1 Small World Routing 

The theory that supports structured routing comes from 
works on routable small-worlds [?, ?]. However, we in- 
troduce novel practical routing algorithms, which make 
network maintenance a natural consequence of those 
routing algorithms. As we discuss in Section 3.1, each 
node has an address that can be interpreted as a coor- 
dinate on a ring. As such, there is directionality (e.g. 
clockwise and counterclockwise). There are two mecha- 
nisms for routing on this structure: destination based and 
direction based. 

In direction based routing, we use fixed addresses
(class-124) to refer to "clockwise" and "counterclockwise".
When the packet's HOPS equals its TTL, the
packet is delivered. By setting the TTL, a node can
thus communicate with its near-neighbors on the ring.
This might have interesting applications for caching in
DHT systems. Nodes maintain connections to at least
the two nearest nodes in each direction. Direction
based routing is what enables a node to find its
near-neighbors in order to connect to them.

Destination based routing is slightly more complex. 
This mode of routing refers to the case where one node 
wants to address a second node by that second node's 
class-0 address, not based on its relative position on the 
ring. The simplest approach would be to route to the 
neighbor node which is closest to the destination, never 
routing to a node that is further. This routing type is de- 
scribed in Algorithm 1. Clearly there can be no loops 
since each packet must get closer to the destination at 

¹ For randomly selected addresses, the network size will have to be
$2^{79}$ nodes before we are likely to reuse an address.

Algorithm 1 GreedyNextHop(v, source, target):
This algorithm describes how a packet arriving at v
from source takes its next hop towards the target using
greedy mode. Each hop tries to get closer (without
visiting source) to target. The adjacency list of node
v is denoted Adj[v], and the distance between two nodes
(a, b) in the network is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    u_min ← v
    for all u ∈ Adj[v] do
        d_tmp = DIST_ring(u, target)
        if d_tmp < d_min then
            d_min ← d_tmp
            u_min ← u
        end if
    end for
    if u_min ≠ v or u_min ≠ source then
        Deliver to u_min
    else
        This is the last hop. Deliver locally to v.
    end if



Algorithm 2 ExactNextHop(v, source, target): This
algorithm describes how a packet arriving at v from
source takes its next hop towards the target using
exact mode. Each hop tries to get closer (without visiting
source) to target. The packet is delivered only to the
target and no other node. The adjacency list of node v
is denoted Adj[v], and the distance between two nodes
(a, b) in the network is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    u_min ← v
    if v = target then
        This is the last hop. Deliver locally to v.
    else
        for all u ∈ Adj[v] do
            d_tmp = DIST_ring(u, target)
            if d_tmp < d_min then
                d_min ← d_tmp
                u_min ← u
            end if
        end for
        if u_min ≠ v and u_min ≠ source then
            Deliver to u_min
        end if
    end if


each step. In some cases it may be desirable for a packet
to be delivered only to the exact target class-0 address,
as shown in Algorithm 2. Kleinberg showed that the
number of hops is $O(\log^2 N)$ on average between any
two nodes (when each node has 1 correctly distributed
"shortcut" connection)[?, ?, ?]. If $k < \log N$
"shortcuts" are maintained, the routing latency can be reduced
to $O(\frac{1}{k} \log^2 N)$ hops. This result allows for a
trade-off between node degree and routing latency.
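
As an illustration of Algorithm 1, the following Python sketch
routes a packet on a toy ring of eight nodes with one shortcut.
It is a sketch under assumptions, not the BruNet router: the
function names are hypothetical, dist_ring is taken to be the
shorter arc on the ring, and the final test is read as "a
strictly closer neighbor exists and it is not the node the
packet came from", following the prose description of greedy
mode.

```python
def dist_ring(a, b, ring_size=2 ** 160):
    """Assumed ring metric: the shorter arc between a and b."""
    d = (b - a) % ring_size
    return min(d, ring_size - d)


def greedy_next_hop(v, source, target, adj, ring_size=2 ** 160):
    """Return the neighbor to forward to, or None to deliver locally at v."""
    d_min, u_min = dist_ring(v, target, ring_size), v
    for u in adj[v]:
        d_tmp = dist_ring(u, target, ring_size)
        if d_tmp < d_min:
            d_min, u_min = d_tmp, u
    if u_min != v and u_min != source:
        return u_min
    return None


def greedy_route(start, target, adj, ring_size=2 ** 160):
    """Follow greedy next hops until some node delivers locally."""
    path, source, v = [start], None, start
    while True:
        nxt = greedy_next_hop(v, source, target, adj, ring_size)
        if nxt is None:
            return path
        path.append(nxt)
        source, v = v, nxt


# Toy ring: addresses 0, 10, ..., 70 on a ring of size 80,
# plus one shortcut between 0 and 40.
nodes = list(range(0, 80, 10))
adj = {n: [(n - 10) % 80, (n + 10) % 80] for n in nodes}
adj[0].append(40)
adj[40].append(0)

path = greedy_route(0, 50, adj, ring_size=80)
print(path)  # [0, 40, 50]
```

Because every forwarded hop strictly reduces the distance to
the target, the loop always terminates. In this example the
shortcut lets the packet reach node 50 in two hops.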



Algorithm 3 AnnealingNextHop(v, source, target):
This algorithm describes how a packet arriving at v from
source takes its next hop towards the target using
annealing mode. Each hop tries to get closer (without
visiting source) to target unless that is not possible, in which
case the packet is delivered to v and sent to the next
closest node. The adjacency list of node v is denoted Adj[v],
and the distance between two nodes (a, b) in the network
is DIST_ring(a, b).

    d_min ← DIST_ring(v, target)
    d_sec ← d_min
    u_min ← v
    u_sec ← v
    for all u ∈ Adj[v] do
        d_tmp = DIST_ring(u, target)
        if d_min ≤ d_tmp < d_sec then
            d_sec ← d_tmp
            u_sec ← u
        else if d_tmp < d_min then
            d_sec ← d_min
            u_sec ← u_min
            d_min ← d_tmp
            u_min ← u
        end if
    end for
    if u_min ≠ v or u_min ≠ source then
        Deliver to u_min
    else
        Deliver locally to v
        Deliver to u_sec
    end if




In a real system there may be some problems to deal 
with. In particular, the ring may be broken by several 
nodes leaving at once. In that case, the ring becomes a 
line. If the line is not reconnected into a ring, a subse- 
quent failure could cause the line to split, which would 
break routability. As such, we add some exceptions to
the simple routing discussed above which make
reconnecting the ring easier: namely, we do not require that
the packet get closer to its destination on its first hop,
as described in Algorithm 3.

4.2 Joining the Small-World

In order for a node to join the ring, it makes use of Rout- 
ing Algorithm 3: annealing routing. The annealing rout- 
ing tolerates some disorder in the network. Every node 
that joins the ring must have a 160-bit class-0 address. 
This address must be randomly-generated to ensure the 
near uniform distribution of addresses on the ring; thus 
class-0 addresses are obtained by using a secure hash
algorithm or some other source of random bits. After a
node has a class-0 address it must find its place in the 
ring. This means that it needs to make a connection to 
the closest node on both the right and left of its own ad- 
dress. Since the new node is not yet connected to the cor- 
rect place in the ring it is not yet able to route messages 
using the routing algorithms described above. The new 
node instead makes use of a node that is correctly placed 
in the ring as a proxy in order to find its place. The new 
node creates a special type of bootstrapping connection 
that does not support any of the routing algorithms above 
but does provide for packets to be sent to the node on the 
other end of the connection. This bootstrapping connec- 
tion allows the new node to communicate with the proxy 
in order to send and receive messages while it is waiting 
to find its place in the ring. The proxy sends a request 
to connect to the new address, which is not yet in the
network. Because the new node is not yet present at that
address, the closest node on the right and the closest node
on the left of the new node
will form connections to the new joining node. At this
point the new node is at the correct location in the ring 
and can add additional neighbors and shortcut connec- 
tions as needed. Algorithm 4 shows this process. 

Connection is not an instantaneous process. Our im- 
plementation uses two round trips: a link request and 
response, and a status request and response. The link 
messages exchange the node addresses, the IP addresses 
and port numbers, and whether the connection is a near- 
neighbor connection or shortcut connection. The status 
message allows the nodes to communicate some of their 
properties to their neighbors. In particular, the status 
message shares the node address and IP information of 
other nodes which are close to the new neighbor. This 
information allows nodes to verify that their views of the 
network are consistent and make repairs. 

In addition to neighbor connections, every node must 
also maintain k shortcut connections to other nodes that 
are far away in the address space. Specifically, the
distances traveled by all the shortcut connections in the
structured ring must follow a probability distribution
function (pdf) of the following form: $p(d) \propto 1/d$, where
d denotes the distance traveled by the shortcut
connection [?, ?]. We use the local density of addresses to
estimate network size and thus $d_{ave}$, the average distance
between nodes. Then, we choose a random distance d



Algorithm 4 JoiningTheRing(v, u): This algorithm
describes how a new node, denoted as v, joins the
structured ring. The proxy that helps v find its place in the
network is called u. The class-0 address of a node is
denoted as ADD(node). ADD(v_c) is the closest address
to ADD(v). PREV(v_c, v) is the closest neighbor of v_c
in the direction of v.

    v makes a proxy connection to node u.
    v sends a connection request through u to ADD(v).
    u sends a connection request to ADD(v).
    v_c receives the request and connects to v.
    v sends a connection request to PREV(v_c, v).
    PREV(v_c, v) connects to v.
    v is now in the correct ring location.



between $d_{ave}$ and $d_{max} = 2^{160}$ with probability
proportional to 1/d and connect to the node closest to that
address using Routing Algorithm 1 (greedy routing). The
method we use to select a proper distance is to define a
random variable x distributed uniformly over [0, 1], and
set:

$$d = d_{ave} \left( \frac{d_{max}}{d_{ave}} \right)^{x}$$

From the above, we see that:

$$\mathrm{Prob}(d < L) = \mathrm{Prob}\left( x < \frac{\log (L / d_{ave})}{\log (d_{max} / d_{ave})} \right)$$

which is clearly the CDF for the random variable d to be
distributed proportional to 1/d over $(d_{ave}, d_{max})$. This
is repeated k times. The total cost in packets to join the
network is $O(\log^2 N)$, since we need to send $O(k)$
packets and each packet requires $O(\frac{1}{k} \log^2 N)$ hops.
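
The sampling rule above can be written down directly. The
sketch below is illustrative only: the function names are
hypothetical, d_ave is assumed to be $2^{160} / N$ for the
N = 1060 deployment, and floating point is used for brevity
even though it cannot represent a full 160-bit distance
exactly.

```python
import math
import random

D_MAX = 2 ** 160


def shortcut_distance(d_ave, x, d_max=D_MAX):
    """d = d_ave * (d_max / d_ave) ** x, for x drawn uniformly from [0, 1]."""
    return d_ave * (d_max / d_ave) ** x


def cdf(L, d_ave, d_max=D_MAX):
    """Prob(d < L) = log(L / d_ave) / log(d_max / d_ave)."""
    return math.log(L / d_ave) / math.log(d_max / d_ave)


def sample_shortcut_distances(d_ave, k, rng):
    """Draw k shortcut distances, one per shortcut connection."""
    return [shortcut_distance(d_ave, rng.random()) for _ in range(k)]


d_ave = D_MAX / 1060               # assumed average spacing for N = 1060
rng = random.Random(0)             # seeded only to make the run repeatable
samples = sample_shortcut_distances(d_ave, k=3, rng=rng)

# Plugging a sampled distance back into the CDF recovers x.
print(cdf(shortcut_distance(d_ave, 0.25), d_ave))
```

The round trip through cdf returns the original x (up to
rounding), which is the statement that d is distributed with
density proportional to 1/d.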

5 PlanetLab Experiments 

This section describes the results of the reliability tests of
the BruNet software. All of the experimental results on
our implementation were obtained using the global
PlanetLab test-bed. PlanetLab provides a realistic WAN
environment in which to test distributed applications. In fact,
PlanetLab nodes are often highly loaded and represent a very
challenging test environment.

5.1 Experimental Methodology 

PlanetLab gives access to around 400 computers that are 
located in many countries around the world. There are 
dozens of research projects running simultaneously on 
the scarce computational resources provided by PlanetLab.
As a result, PlanetLab provides a measure of
application performance under very adverse computational and
traffic load conditions. For the experiments presented in

Figure 2: The 1-D Kleinberg small-world structure
requires that the distances of the shortcut connections have
a pdf $p(d) \propto 1/d$. In this PlanetLab experiment, we see
that cdf(d) follows the expected logarithmic
distribution for a network of size 1060.

this section, around 100 PlanetLab machines were em- 
ployed. 

The current implementation is in C# using the Mono 
development platform. In order to minimize memory and 
other computational resource usage on PlanetLab ma- 
chines, we run multiple nodes inside a single Mono run- 
time process. As a result, many nodes can reside on a 
single machine. However, each node is executed on a 
separate thread and maintains its own connections and 
data. Furthermore, since class-0 addresses are assigned
randomly, nodes that reside on the same physical
machine are unlikely to be close to each other in the
address space. We note that the UDP transport is used for
all experiments presented in this section.²

In our experiments, we wish to see that the structure of 
the network is correct, that the system can indeed route 
packets, and that the system is robust to node arrivals and 
failures. We analyze the logs of our experiments with a 
software tool which shares no code with the BruNet
system itself. The metric we use to measure the robustness
of the network is routability. The routability of the network
is defined as the fraction of pairs of nodes which can
communicate using the standard (in this case greedy) routing
algorithm.
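
A routability measurement of this kind can be sketched as
follows. This is not the authors' log-analysis tool: the
function names are hypothetical, the ring metric is assumed to
be the shorter arc, pairs are counted as ordered (source,
target) pairs, and the toy topology stands in for a snapshot
reconstructed from logs.

```python
from itertools import permutations


def dist_ring(a, b, ring_size):
    """Assumed ring metric: the shorter arc between a and b."""
    d = (b - a) % ring_size
    return min(d, ring_size - d)


def greedy_delivers(src, dst, adj, ring_size):
    """True if greedy forwarding from src reaches dst without getting stuck."""
    v = src
    while v != dst:
        best = min(adj[v], key=lambda u: dist_ring(u, dst, ring_size), default=v)
        if dist_ring(best, dst, ring_size) >= dist_ring(v, dst, ring_size):
            return False  # no neighbor is strictly closer
        v = best
    return True


def routability(adj, ring_size):
    """Fraction of ordered node pairs that greedy routing can connect."""
    pairs = list(permutations(adj, 2))
    reachable = sum(greedy_delivers(s, t, adj, ring_size) for s, t in pairs)
    return reachable / len(pairs)


nodes = list(range(0, 80, 10))
ring = {n: [(n - 10) % 80, (n + 10) % 80] for n in nodes}
intact = routability(ring, ring_size=80)

# Cut the link between 0 and 70 so that the ring becomes a line.
ring[0].remove(70)
ring[70].remove(0)
broken = routability(ring, ring_size=80)
print(intact, broken)
```

On the intact toy ring every pair is routable, while cutting a
single edge already makes some pairs unreachable under greedy
routing.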

5.2 Structure Verification 

As discussed in Sections 3.1 and 4, every node is
identified by a unique 160-bit address, which can be
interpreted as an integer; nodes are arranged in a ring, with the
convention that the integer representation of the node
address increases in the clockwise direction. Furthermore,

² We have verified that the system on a TCP transport delivers
comparable performance to UDP.



our structured small-world routing network requires that
each node keep neighbor connections to the two closest
class-0 addresses in both the clockwise and the
counterclockwise directions. In other words, the structured ring
is correct if and only if the following is true: every node
has connections to its first and second class-0 neighbors
in both the clockwise and the counterclockwise directions in
the address space.
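
The correctness condition lends itself to a direct check. The
sketch below is an assumption-laden illustration rather than
the verification tool used for the experiments: the function
name ring_is_correct and the adjacency representation (a
dictionary mapping each address to a set of connected
addresses) are hypothetical.

```python
def ring_is_correct(adj, neighbors_per_side=2):
    """Check that every node is connected to its nearest ring neighbors.

    adj maps each node address to the set of addresses it is connected to.
    """
    order = sorted(adj)  # addresses increase in the clockwise direction
    n = len(order)
    for i, node in enumerate(order):
        for step in range(1, neighbors_per_side + 1):
            clockwise = order[(i + step) % n]
            counterclockwise = order[(i - step) % n]
            for neighbor in (clockwise, counterclockwise):
                if neighbor != node and neighbor not in adj[node]:
                    return False
    return True


nodes = [0, 10, 20, 30, 40, 50]
adj = {v: {nodes[(i + s) % 6] for s in (-2, -1, 1, 2)}
       for i, v in enumerate(nodes)}
correct_before = ring_is_correct(adj)

adj[0].discard(20)  # drop the connection to the second clockwise neighbor
correct_after = ring_is_correct(adj)
print(correct_before, correct_after)  # True False
```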

We have successfully deployed a correct structured
ring of 1060 nodes on PlanetLab. It is difficult to
see much in visualizations of such large graphs; however,
we present visualizations of networks of various sizes in
Figure 1 and Figures 9-11.

We verified the correctness of the shortcut distance
distribution as follows. After the deployment
of a correct 1060-node structured ring, all the
shortcut connection distances were extracted from the
experiment logs. The cumulative distribution function (cdf)
of the shortcut distances is plotted in Figure 2. Note that
the experimental cdf curve is in good agreement with the
expected curve: $cdf(d) \propto \log(d)$.

5.3 Churn 

Nodes do not stay in a P2P network indefinitely. One of 
the most striking aspects of the P2P network paradigm is 
that we assume that nodes are fundamentally faulty and 
will join and leave a network unexpectedly. Any real sys- 
tem must deal with unexpected arrivals and departures, 
which is called churn. 

A major question is: will a node complete the joining
process correctly, in the presence of a slightly disordered
network, before the node departs? There are two
important time scales in the churn process: the mean
round-trip-time (RTT) between the hosts at the IP layer, and the
mean session time of the node. As the session time ap- 
proaches the RTT, clearly the system will not work prop- 
erly. Since each node requires two neighbor connections 
and at least one shortcut connection, the time required to 
establish the node will be much greater than the RTT. 

In our experiment, we created a correct network of 980 
nodes on PlanetLab. Once the network was correct, we 
then started the system churning for 25 minutes. Each 
second, with a fixed probability, every node abruptly 
goes offline, and then rejoins the network. This corre- 
sponds to an exponential distribution on session time. 
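
The link between the per-second departure probability and the
session time can be made explicit with a short derivation. The
symbols p (the per-second departure probability) and T (the
session length in seconds) are introduced here for
illustration and are not used elsewhere in the text.

```latex
\begin{align*}
\Pr(T > t) &= (1 - p)^{t} \approx e^{-p t}, \qquad t = 0, 1, 2, \dots \\
\mathbb{E}[T] &= \sum_{t \ge 0} \Pr(T > t) = \frac{1}{p}
\end{align*}
```

Under this reading, a mean session time of 12 minutes (720
seconds) corresponds to p = 1/720, or roughly 0.0014 per
second.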

Figure 3 shows the results of our experiment. We
find that when the mean session time is above 12 minutes,
the system is more than 99% routable; however, as the mean
session time decreases to 5.7 minutes, the
system becomes significantly more disordered, with a
routability of 84%. Further decreasing the mean session
time causes the system to fall apart and tend to very low
values of routability. Exactly how the system transitions

Figure 3: This experiment measures the routability of a
network of size 980 as a function of the mean session time for
each node. Once the mean session time is above 10 minutes,
the system has nearly perfect routability.

from highly routable to non-routable is very interesting,
but is left to future work.

Our churn model is equivalent to Poisson arrival and 
departure processes: the number of nodes that depart 
in any interval is described by the Poisson distribution. 
Real systems do not exhibit Poissonian churn, but
instead exhibit a heavy-tailed distribution on session time:
the median uptime is often low (a few minutes) but there
are many nodes with very long uptime[?]. Simulations
which have compared Poissonian churn to churn rates
obtained from real P2P traces have found that real traces
are comparable to Poissonian churn with mean session
times of around 100 minutes [?]. Thus, since our system
can easily handle mean session times of 12 minutes, the 
system should perform very well in real environments 
with real loads. 

We note that the cost of joining the network for Symphony
is $O(\log^2 N)$, and this cost comes into play when
considering churn resistance. We believe that P2P systems with
lower joining costs should be more churn resistant. For
instance, in Viceroy[?] joins cost $O(\log N)$.
Implementing Viceroy within our framework would not be difficult.

5.4 Massive Joins and Failures 

One outstanding feature of this system is its ability to 
maintain a correct structure under diverse node dynam- 
ics including massive node insertions, massive node fail- 
ures and even the merging of two formerly disconnected 
rings. In Figure 4 we observe that nearly every pair of 
nodes in the network can communicate using structured 
routing even under adverse conditions such as massive 
node joins and failures. 

Given that the primary objective of the presented
system is overlay routing, an important performance
metric is the fraction of the pairs of nodes in the network
that can communicate with each other; this is denoted as
routability. To investigate how robust the system is to
massive changes in network connectivity, we start with 
a completely routable, 460-node PlanetLab deployment 
and insert another 450 nodes into the network simulta- 
neously. This experiment is depicted in Figure 4. Less 
than one minute after the massive join the fraction of the 
network that is mutually routable falls to 0.65. Within 
another minute the fraction rebounds to 0.90. Within 11 
minutes of the massive join the entire 910-node network 
is routable. 

A similar experiment was reported for Tapestry [?],
in which a 325-node Tapestry network experiences a 60%
massive join, bringing the network size to about 525
nodes. Prior to the massive join the routability was in
the high 90% range but not 100%. Just after
the join the routability falls below 0.70 and then
rebounds to about 0.95 within 10 minutes. However,
even after 60 minutes Tapestry is still only about 95%
routable. Thus the presented system exhibits good
robustness compared to Tapestry under these
conditions. It should be noted that Tapestry has published
fault-correcting protocols [?] designed to improve robust- 
ness under these types of node dynamics. These addi- 
tional protocols from Tapestry have been tested in a LAN 
cluster but apparently not in a WAN environment such as 
PlanetLab. 

The system can also manage the merging of
multiple disconnected structured rings into a single ring, as
seen in Figures 9-11. This merging experiment was
conducted as follows: we deployed two separate networks
of sizes 470 and 499 respectively on PlanetLab; each
network was totally ignorant of the existence of the other
network; after both networks had formed correct rings,
we deployed a single node that was aware of nodes in
both networks; as a result, the two previously
disconnected rings were merged into a single ring of size 970.
The time for the two correct rings to merge into a single
large correct ring is approximately 7 minutes. Figures 5-8
show an example of how the merging dynamics work.
The exchange of neighbor lists in the connection proto- 
col causes the two rings to be sewn together analogously 
to zipping the two halves of a zipper together. Based on 
this zipping action it is clear that it will take O(N) time 
for two rings to correctly merge if there is a single con- 
tact point between the rings. 

As demonstrated by this ring merging experiment, net- 
works that have become split due to catastrophic outages 
can easily join back together. These findings indicate that 
the network will recover gracefully after major infras- 
tructure outages that fracture or disable large fractions of 
the underlying physical layer network. 

Figure 4: The network is very robust during gradual joins, massive joins and massive failures of nodes. After abrupt
changes in connectivity, the network structure heals back to a perfect ring very rapidly and achieves overwhelming 
percentage routability long before the ring is completely correct. This demonstrates the applicability of the system to 
highly dynamic applications. Moreover, from examining the bottommost figure, one can observe that the number of 
missing edges in the network decreases exponentially fast in time after the massive join of 450 nodes. 




Figure 5: Two distinct routable rings denoted as Ring 1 and
Ring 2 can be merged into a large routable ring. Here we
depict Ring 1 merging with Ring 2.

Figure 6: "C" connects to "B" and "D", the two closest nodes
on Ring 2. As a normal part of the connection protocol, "C"
sends its neighbor lists to "B" and "D".

Figure 7: Based on the neighbor-list information obtained
from "C" while connecting, "B" connects to "A" and "D"
connects to "E".

Figure 8: The network is now correctly ordered but there are
many more connections than are needed. Each node maintains k
connections to the closest neighbors on the right and left
(k = 1 in this example). Each node will trim the excess
connections until only the k closest on each side remain.

Figure 9: This network on PlanetLab has 499 nodes.




6 Conclusion 

We present a new software framework for implementing
P2P protocols. We use this framework to present the
first implementation of the 1-D Kleinberg routable
small-world model. We have shown that the C# implementation
produces networks that have the required topological
structure to provide scalable structured small-world
routing. The system is also very robust in the presence 
of large node dynamics including massive joins, massive 
failures, disconnected ring merges and churn. Given that 
this system is intended to provide overlay routing over 
heterogeneous physical layers and transport protocols, 
this robustness is critical to enabling reliable overlay ap- 
plications. 

We anticipate that this framework will be valuable to 
other researchers to allow them to implement new P2P 
routing and connection management protocols, without 
the need to reimplement solutions to common problems 
of node handshaking, packet sending and receiving, and 
abstraction of underlying transports, such as UDP and 
TCP. Future work will include using this framework to
implement unstructured P2P protocols along with
structured P2P protocols.



Figure 10: This network on PlanetLab has 470 nodes. 




Figure 11: The separate rings are merged together to form a
single 970-node network on PlanetLab. The entire merge process
takes 7 minutes.